An Adaptive Injection Model for Pansharpening

Pansharpening technology is used to acquire a multispectral image with high spatial resolution from a panchromatic (PAN) image and a multispectral (MS) image. The detail injection model is popular for its flexibility. However, the accuracy of the injection gain and the extracted details may greatly influence the quality of the pansharpened image. This paper proposes an adaptive injection model to solve these problems. For detail extraction, we present a Gaussian filter estimation algorithm by exploring the intrinsic character of the MS sensor and convolving the PAN image with the filter to adaptively optimize the details to be consistent with the character of the MS image. For the adaptive injection coefficient, we iteratively adjust the coefficient by balancing the spectral and spatial fidelity. By multiplying the optimized details and injection gain, the final HRMS is obtained with the injection model. The performance of the proposed model is analyzed and a large number of tests are carried out on various satellite datasets. Compared to some advanced pansharpening methods, the results prove that our method can achieve the best fusion quality both subjectively and objectively.


Introduction
Remote sensing aims at extracting the land information of the Earth's surface using satellites. Owing to the disadvantage of sensor technology, acquiring a high-resolution remote sensing image in both spatial and spectral domains is still difcult [1]. Terefore, image processing technology is usually adopted to fuse the existing two kinds of images, namely, the panchromatic (PAN) image which has a high spatial resolution but only one channel, and the multispectral (MS) image which has a low spatial resolution but high spectral resolution. For the MS and PAN images which are obtained simultaneously in the same area, they can be combined by appropriate algorithms to generate a fused image with high resolution in both spectral and spatial aspects. Tis process is also called PAN sharpening [2]. Pansharpened images are widely used in geography, military, agriculture, ecological environment, and other felds [3,4].
Te existing traditional fusion methods are usually classifed into two types. Te frst one is component substitution (CS), which replaces the spatial components of MS images with a PAN image. Popular methods of this kind include adaptive Gram-Schmidt (GSA) [5], the principal component analysis (PCA) [6], and the intensity-hue-saturation (IHS) methods [7]. Te fusion result based on the CS method has high spatial resolution but easily sufers from spectral distortion [2,8].
Te other kind is multiresolution analysis (MRA), which adds extracted details to MS images through the multiscale decomposition of PAN images. Traditional MRA methods mainly include methods based on pyramid decomposition or wavelet transform. For instance, theàtrous wavelet transform [9], discrete wavelet transform [10], and the contour wave transform [11] all belongs to this category. A nonlinear approach called morphological flters (MF) [12] has also been successfully applied in the pansharpening feld. By exploring the modulation transfer function (MTF) of the MS sensor, a generalized Laplacian pyramid (GLP) based on the MTF flter; i.e., MTF-GLP [13] is presented to adaptively inject the details. To better preserve texture information in the fused image, Yang et al. [14] proposed a multiscale decomposition method based on guided fltering, and the efect is improved. Recently, the popular additive wavelet luminance proportional (AWLP) was enhanced in [15] by taking into account the acquisition device and optical radiation transfer process. Te MRA method can better retain the spectral features. However, some spatial information may be lost [4,16].
In recent years, variational optimization (VO)-based fusion methods have attracted the attention of many researchers. Tis kind of method usually constructs an estimation model based on the assumption of spectral and spatial fdelity between the fused image and source images. Ten, a TV regularizer is proposed in [17] and applied to each band of the HRMS images. Later, Palsson et al. [18] considered a spectral reduced-rank relationship between the PAN and HRMS images and used the l2 regularizer to constrain the model. Other classes of methods such as sparse representation-based [19] and Bayesian-based [20] are also included in the VO-based methods. Tese methods usually can better balance the spectral and spatial quality than the traditional methods. However, they require much more computational resources [21]. Besides, the majority of the previously discussed VO methods rely on the regularization parameters, which need to be determined manually and may afect the accuracy of the model [22].
Deep learning (DL), one of the hottest research areas in artifcial intelligence, has produced outstanding outcomes in the fusion of remote-sensing images. Typically DL-based networks include autoencoders (AEs) [23] and convolutional neural networks (CNNs) [24]. To better preserve spectral and spatial details, Yang et al. [16] designed an endto-end depth residual network-based fusion method to automatically learn a mapping from remote sensing images, which has achieved good results. Recently, some unsupervised DL-based methods have been developed based on the intrinsic properties of the source images. For example, Liu et al. [25] frst applied the generative adversarial network (GAN) in pansharpening and proposed PSGAN, and then Ciotola et al. [26] proposed a general unsupervised network in the full-resolution framework, thus making full use of the original information and reducing the spectral distortion. However, these methods require a lot of computation, long running time, and numerous samples [27,28]. Owing to the limitation of the number of image samples and the requirement of algorithm efciency, this paper comprehensively considers the abovementioned methods and optimizes the injection model based on the MRA method to achieve dual fdelity of spectral information and detailed information during fusion.
Te key step in the MRA-based method is how to extract the details of the PAN image. A more advanced method is to use a linear time-invariant flter to match the point spread function (PSF) of the MS sensor. Aiazzi et al. [29] argued that it is better to use a flter to simulate the MTF of the MS sensor, but it requires using the factory information of the MS sensor, and it is not easy to obtain. Another more reliable method is to estimate the flter from the existing image to simulate the PSF of the MS sensor. In this framework, a lowspatial-resolution MS (LRMS) image can be obtained by spatially fltering a high-spatial-resolution MS (HRMS) image. Te impulse response function of this flter can simulate the point spread function of the MS sensor, and the flter usually has a shape similar to the Gaussian model [30,31]. Using this flter to convolve the PAN image, the resulting high-frequency parts are the missing detail components in the HRMS image and are highly linearly correlated with the MS image.
Considering spectral fdelity and detail injection, this work proposes a pansharpening method based on Gaussian flter estimation and adaptive detail injection. Tis method frst obtains the initial fused image by the guided fltering injection model of multiscale decomposition, and then simulates the PSF of the MS sensor with Gaussian fltering and convolving the PAN image with the estimated Gaussian flter to obtain the optimized injected details. Ten, the adaptive fusion coefcient is calculated recursively, so that the spectral fdelity and detail injection can be jointly optimal. Te main innovations of our work are shown as follows: (1) An adaptive injection model is proposed to realize the dual fdelity of spectral and spatial information (2) A detail extraction method based on Gaussian flter estimation is proposed to optimize the injected details (3) An adaptive fusion coefcient is designed to automatically optimize the volume of the injected details

General Framework of Injection
Model. Te injection model based on the MRA method refers to a fusion method in which the injection details are extracted by fltering the source images, and then the details are added to the upsampled MS image with the injection coefcients. Te method can combine diferent pansharpening technologies according to the actual need, to exert the advantages of diferent fusion technologies. Te general representation of the injection model is as follows [24]: where k � 1,2, . . ., N represents the subchannel k in the N channels, MS k represents the estimated HRMS image of the k-th channel, MS k denotes the k-th channel after bicubic upsampling, G k represents the injection coefcient of the k-th channel, P I represents the PAN image after histogram equalization of the I component of the MS image, and P L represents a low-resolution version of the PAN image. For the injection coefcient, Vivone et al. [32] propose an interchannel weight coefcient based on a regression model through an iterative algorithm. Yin and Li [33] propose a weight coefcient based on the ratio between pixel channels. Vivone et al. [34] use the high-pass modulation (HPM) method with the multiplicative combination of the source images as the injection gain. Among them, the injection coefcient using the ratio between channels can better preserve the spectral information, and the fusion result is better. Te injection coefcient between channels is defned as follows: 2 Computational Intelligence and Neuroscience (2)

Extraction of Intensity Component.
Since it is not desirable to change the chroma and saturation while injecting details, only the intensity (I) component of the image is often processed in the image fusion process. Te I component can be calculated using the following formula: where α k represents the combination coefcient of the channel. Te authors in [35,36] proposed that the intensity can be calculated using the linear combination of the bands, that is, the combination coefcients are usually acquired by solving the optimization problem as follows: where P represents the PAN image. After obtaining the I component of the MS image, perform histogram matching between P and I in the above formula, and obtain P I using the following equation: where u P represents the mean of P, u I represents the mean of the I component, σ I represents the standard deviation of the I component; σ P represents the standard deviation of P.

Guided Filtering.
Guided fltering was frst proposed by [24], which can save the main feature of the input image and obtain the changing trend of the guided image. Taking P I as the input image and I as the guide image, P I is generated by intensity-guided fltering to generate P L , and detailed information linearly related to the MS map is obtained. Guided fltering can be obtained by the following simplifed formula: where GF(·) represents the guided flter function. Multiscale-guided fltering can be expressed as follows [13]: where j represents the number of guided fltering layers; P j L represents the image obtained by P I through the j-th layer of guided fltering.

Te Proposed Framework.
Tere are two main defciencies in the current injection model. For one thing, there is a low correlation between MS and PAN images, which may cause spectral distortion. On the other hand, there may be over-detail injection or insufcient injection. Aiming at the two problems, this work proposes a method for detail extraction and adaptive injection coefcient optimization based on Gaussian flter estimation. Te framework of this method is shown in Figure 1.
Firstly, according to formula (1), the source images are used to obtain an initial fusion image through traditional injection models such as histogram matching and guided fltering.
Secondly, we proposed the detail optimization method, that is, fltering the initial fusion image with a multiscale Gaussian flter to simulate the characteristics of the MS sensor, and fltering the PAN image with the obtained Gaussian flter estimation to obtain the optimized details.
Tird, we optimize the injection gain, that is, comprehensively consider the spectral information and the detail information to calculate the adaptive injection amount coefcient. Te specifc process is shown in the dotted box in Figure 1, and the detailed algorithm is shown in Section 3.2.
Finally, the optimized details are multiplied by the injection coefcient, and then the fnal HRMS image is acquired by adding the optimized details to the upsampled MS image.

Gaussian Filter Estimation and Detail Extraction.
As mentioned above, Reference [31] proposed to use the Gaussian flter to simulate the MTF of the MS sensor; that is, the HRMS image is regarded as the LRMS image after passing through the MS sensor. A Gaussian flter is used to simulate this process, and the resulting Gaussian flter is a fltered estimate conforming to the PSF of the MS sensor. Next, the PAN image is convolved with the estimated flter, and the extracted detail components are those missing from the low-resolution MS image. Yang et al. [37] proved that the flter estimation has strong robustness through experiments. Even if white noise is added to the initial fusion image, it does not afect the extraction of details. In addition, Yin and Li [33] proposed a two-step multiscale decomposition method to refne PAN sharpening. Inspired by this, this paper proposes to use multiscale Gaussian fltering to simulate the PSF of the MS sensor, that is, extract the luminance component from the initial fusion image F (1) , denoted as I 1 , and perform Gaussian fltering iteratively, which can be expressed as follows: where I i 1 represents the image after the i th fltering and H G (·) represents the Gaussian flter. To obtain the best Gaussian flter estimation, the correlation coefcient between the result I i 1 obtained by formula (8) and the I component of the upsampled MS image is calculated. When the correlation coefcient is the highest, the estimated Gaussian flter is the most accurate, and the correlation coefcient is calculated as follows: Computational Intelligence and Neuroscience where cov (·) represents the covariance function and std (·) represents the standard deviation function. As the fltering level increases, the fltered image and the up-sampled MS image get closer and closer, and after reaching the maximum value, as the fltering level continues to increase, the fltered image and the upsampled MS image deviate increasingly. By iterative calculation of the values of formulas (8) and (9), the number of iterations m is obtained when formula (9) is maximum, which is the estimated Gaussian flter H m , which is expressed as follows: where H m means the best match of the PSF of a sharpened image blurred by the MS sensor. Te flter H m is applied to the PAN image, and the extracted detail components should also conform to the MS sensor PSF, thus reducing the spectral distortion, and the detail extraction D defned as follows: Combining formulas (1), (7), and (11), after optimizing the details, the k th channel fusion image F (2) k can be updated to the following form:

Adaptive Injection Coefcient.
After obtaining the optimized injection details, this paper optimizes the injection coefcient. Diferent images have diferent structural features and spectral information and thus have diferent requirements for the amount of detail injection. Directly injecting details using the same gain can easily lead to too much injection to cause spectral distortion or insufcient injection to cause image blur. Terefore, it is necessary to retain the spectral and detail information through an injection amount coefcient g. After adding this coefcient, combined with formula (12), the new fusion image F (3) k can be defned as follows: We need to determine diferent coefcients g for remote sensing images with diferent structural features. Te author in [38] believes that the requirements for spectral fdelity and spatial resolution in the injection model need to be balanced, and the weighted sum of the two evaluation indicators can be used as the fusion image indicator, and the weight indicates the degree of importance. We set the initial value g 0 of the detail injection coefcient g and the step size r. For g 0 ≤ g ≤ 1, the corresponding fused image is obtained and the weighted evaluation index is calculated and obtain the g max that makes the evaluation index the highest as the fnal injection coeffcient. For the evaluation index, this paper proposes a comprehensive evaluation system of spectral information and spatial information based on correlation analysis. Te weighted evaluation index Q for spectral information and spatial information is defned as follows: where E SP and E HF represent the spectral information evaluation index and the spatial information evaluation index, respectively; α represents the weight of the two. For an image F (3) k fused by formula (13)  correlation between the fused image and each channel k of the upsampled MS image, and E HF is represented by the linear correlation between I 3 and PAN images, which is defned as follows: Te authors in [28] show that the optimal determination of the injection components and the injection amount g follows the premise that if there is a higher correlation between I and PAN images, there will be a better fusion result. Tat is, if the correlation between the two is high, the spectral information can be well preserved when details are injected. At this time, the weight of the spatial feature can be increased and the weight of the spectral feature can be reduced. Conversely, if the correlation is low, the injected details are likely to cause spectral distortion. Te spectral information weight should be increased, and the spatial information weight should be decreased.
Assuming that the I component corresponding to the fused image generated by the initial value g 0 is denoted as I 0 , the weight α in formula (14) can be determined by the following formula: Te design of the square term of (16) has two reasons. Te frst is to avoid that the spectral weight is too small when I 0 and P I are highly correlated; the second is to magnify the diference between the spatial information of diferent images.
For diferent injection coefcients g, iteratively calculating F (3) k according to formula (13), and calculating the evaluation index Q according to formula (14), g max with the highest index Q can be obtained as the injection coefcient of the fnal fused image. g max can be expressed as follows: Note that g max can be diferent according to the characteristics of the image, such as the image dimension and the type of land cover. Combining formulas (13) and (17), after using the optimized injection coefcient, the fnal fused image can be updated as

Experimental Setup.
To evaluate the performance of our method, we use two datasets from the satellites, including QuickBird and IKONOS. Tese remote sensing images have diferent features in spectral wavelength, spatial resolution, etc., and all include four bands such as red, green, blue, and near-infrared.
For parameter setting, this paper sets the initial value of the injection volume coefcient g 0 � 0.1 and the step size r � 0.05, to thoroughly consider the efciency and precision of the calculation. Te Gaussian flter window is set to 5 × 5, the default setting in reference [22].
Tis paper carried out two kinds of experiments. One is the simulated image after down-sampling the real image, also known as the reduced-scale (RS) experiment, that is, following the Wald protocol [39], and using the original MS image as a reference image. Te simulated images contain a total of 100 sets of images from the datasets with 50 sets in each dataset. Te other is the real image experiment, also known as the full-scale (FS) experiment. Te real images contain the same number of images as the simulated images.

Performance Analysis.
To verify the efectiveness of the optimization method proposed in Sections 3.2 and 3.3 of this paper and quantitatively analyze the performance improvement achieved by the proposed method, we calculate the average objective indicators of each optimization step. Te fused images corresponding to the three optimization steps are expressed as F1 representing the initial fusion image, F2 representing the image after Gaussian fltering to optimize the details, and F3 representing the image after adaptively optimizing the injected coefcients. Te average objective index of 80 groups of image fusion results is calculated, respectively. To more intuitively show the relative changes of each indicator, each objective indicator is normalized, that is, for the result X(i) of an indicator X, the normalization is X(i)/(max (X(i))), the results are shown in Figure 2. We can see that for the initial fusion image F1, the injected details are obtained using the classical MRA-based method. After optimizing the details through the estimated Gaussian flter, the obtained details have a higher correlation with the MS image, and various indicators are signifcantly improved; after optimizing the injection coefcient, it can efectively prevent overinjection or underinjection, and all indicators are further improved. With the sequential optimization steps of F2 and F3, the overall indicators are getting better and better. Among them, the SAM index is basically stable. With optimization, the SAM index frst increases slightly and then decreases slightly, showing that the proposed method has little impact on the loss of spectral information.    Figures 3 and 4, respectively. Te smaller red rectangle is enlarged and displayed in a larger rectangle. From Figure 3, we can see that the results of AWLP, CBD, MTF-GLP, and MF methods generate obvious spectral distortion in the forest area, and the color of them is brighter than the reference image; the results of BFLP and LRFF have slight spectral distortion, and there is an overinjection problem which introduces unnecessary noise. Te result of ASIMP turns into brown in the forest area. Te proposed method is relatively close to the reference image in the spectral information of the beach and forest. Te fusion results in Figure 4 have exhibited a situation similar to those in Figure 3. Te results of comparison methods show diferent degrees of spectral distortion, and the result of the proposed method is closest to the reference image.

Computational Intelligence and Neuroscience
To objectively evaluate the performance of each comparison method, the average objective metrics were calculated after the fusion of 50 sets of remote sensing images from each dataset. Te results are shown in Table 2. We can see from the table that our method achieves the best results on all average metrics on both datasets.     Figure 6, we can see that the results of the BFLP, MF, and LRFF methods produce apparent spectral distortion in the orange roof area. Since there is no reference image, it is difcult to directly distinguish other methods from the proposed method visually. We mainly refer to quantitative evaluation.
To objectively evaluate the efectiveness of each compared method, this paper conducts experiments on 50 sets of real images from each dataset and calculates the average objective indicators as shown in Table 3. Te table shows that except for D s in the QuickBird dataset, all the metrics of the proposed method are optimal, further verifying the good performance of our method.

Conclusion
In the fusion of remote sensing images, there has always been a key problem on how to keep the spectral information fdelity while injecting details. In this paper, a novel adaptive injection-based pansharpening method is proposed, which frst uses the traditional injection model as the initial fusion image and then optimizes the injection details and injection gain. Gaussian flter estimation is used to simulate the characteristics of the MS sensor in the process of optimizing the injected details, and the PAN image is deconvolved with the estimated Gaussian flter to obtain the optimized details. To optimize the injection beneft, the spectral information and detailed information are comprehensively considered, and a weighted evaluation index is established to determine the adaptive injection amount coefcient. Te fusion experiments are conducted on 100 sets of simulated images and real images from the IKONOS and QuickBird datasets. Compared to some advanced fusion methods, the qualitative and average quantitative evaluations show that our method performs better than all other comparison methods.

Data Availability
Te datasets can be obtained from https://www.kosmosimagemall.com/.

Conflicts of Interest
Te authors declare that they have no conficts of interest.  0.0188 0.0458 0. 370 0.0331 0.0540 0. 153 Te bold values indicate that they are optimal.
Computational Intelligence and Neuroscience 9